200716 Expt.311: RNP Purification via OdT and Silane Capture

NB: This demonstration analysis was performed on unpublished development data. This experiment was designed to investigate the effect of a number of protocol changes (which were subsequently deemed to underperform) and explore the use of total inputs as a quantitative baseline for discriminating between RBPs more closely associated with non-adenylated RNA than adenylated RNA.

Aims:
A. To test the following modifications to the RNA-RBP Purification protocol
i) Dropping of NaOAc
NaOAc is typically a precipitative agent. It shouldn't be necessary for OdT capture and, if the global capture truly occurs on the basis of salt bridge formation then it should not be necessary for the silane protocol either. Counter-evidence to the salt-bridging hypothesis includes the successful use of salt-free washes prior to elution from silane, and the role of heating in aiding elution.
ii) Dropping of the high pH spike-in (previously delivered as pH10.8 NaOAc)
A pH10.8 NaOAc spike-in was initially used to raise the pH of acidic cocktails to pH7.8. Extensive investigation of RNA degradation has ruled out all components except this one. Because RNA should be able to withstand pH7.8, it is likely that the original pH meter measurements of the balanced phenol cocktail were not accurate. Furthermore, adding a DNAse rxn buffer post lysis should be adequate to balance the acidic, but weakly buffered, phenol cocktail stock to above pH7; this has been confirmed with bromocresol green.
iii) Substantively increased neat formamide and tris-buffer wash times for the silane prep
Initial silane experiments have tolerated high degrees of salt-free washing. We are simply revisiting this as a convenience aspect of the protocol.
iv) Use of RNAse prior to tryptic digest and increase in both trypsin amount and concentration.
Some past experiments have returned higher than expected missed cleavages. Is it due to physical obstruction by RNA or inadequate trypsin? To find an expected limit under the most favourable conditions RNAse has been used prior to overnight tryptic digest, trypsin increased to 0.5ug for 2e6 cells RBP equivalent, and digestion volume reduced to 50ul (from 100ul). We seek missed cleavage rates to be similar to that of the total inputs (100k cells whole proteome, 1ug trypsin) and for those total input samples to show a missed cleavage rate comparable to previous similar experiments (to confirm there are no batch issues with the new enzyme).

B. To explore the use of total inputs as a quantitative baseline for discriminating between RBPs more closely associated with non-adenylated RNA than adenylated RNA
See associated writeup for Expt.311 for a primer on how this is proposed (ultimately, for this experiment, the underperformance of the silane groups precludes this analysis from being done properly).

Method:
See associated writeup for Expt.311

1. Import Modules and Files

Custom Functions
jwrangle.importMixedFiles( )

I generally import everything I MIGHT use at the start and set up pathing using the OS-agnostic pathlib.
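A minimal sketch of the pathing idea, assuming the notebook sits somewhere beneath a folder named 'experiments'. The helper name find_project_root is hypothetical and is not part of the RBP suite; it only illustrates how the base path can be derived from the path parts.

```python
from pathlib import Path, PurePosixPath


def find_project_root(path, marker='experiments'):
    """Return everything in `path` that precedes the folder named `marker`.

    Hypothetical helper for illustration only.
    """
    parts = path.parts
    if marker not in parts:
        raise ValueError(f"'{marker}' not found in {path}")
    return type(path)(*parts[:parts.index(marker)])


# Example with a pure (filesystem-independent) path
example = PurePosixPath('/home/user/project/experiments/e311')
root = find_project_root(example)
print(root)
print(root / 'my_resources' / 'FASTA')

# In a notebook the same call would be: find_project_root(Path.cwd())
```

Raising an error when the marker folder is missing makes a misplaced notebook fail early rather than silently resolving to the wrong directory.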

In [2]:
#### File utilities
import os
import pandas as pd
from pathlib import Path
from importlib import reload      # 'imp' is deprecated and was removed in Python 3.12

#### Data Wrangling
import copy
import numpy as np

#### RBP Suite Modules
import jwrangle
import jvis
import jinspect
import jtest
import jweb

#### Sequence Tools
from Bio import SeqIO

#### Graphical Packages
import upsetplot as upset
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

#### define working directories
cwd  = Path(os.getcwd())
base_path = Path(os.path.join(*cwd.parts[:cwd.parts.index('experiments')]))

#### MaxQuant proteinGroups & evidence files
MQ_folder = jwrangle.importMixedFiles(cwd / 'MaxQuant')
MQ_folder.keys()

pGroups = MQ_folder['proteinGroups.txt']
evidence = MQ_folder['evidence.txt']

#### Inspect MQ setup
MQ_folder['parameters.txt'].head(9)
C:\Users\smith.j\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3242: DtypeWarning: Columns (63,71) have mixed types.Specify dtype option on import or set low_memory=False.
  if (await self.run_code(code, result,  async_=asy)):
C:\Users\smith.j\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3242: DtypeWarning: Columns (355) have mixed types.Specify dtype option on import or set low_memory=False.
  if (await self.run_code(code, result,  async_=asy)):
Out[2]:
Parameter Value
0 Version 1.6.7.0
1 User name smith.j
2 Machine name MSCYPHER-253
3 Date of writing 07/17/2020 13:51:16
4 Include contaminants True
5 PSM FDR 0.01
6 PSM FDR Crosslink 0.01
7 Protein FDR 0.01
8 Site FDR 0.01

2. Metadata Creation

Custom Functions
jwrangle.MQ_writeMetadata( )

Metadata tabulates the test conditions for ALL experiments that shared the same MQ search and thus all experiments that comprise the MQ outputs. Metadata can also be done in a spreadsheet program.
The metadata table gives users the opportunity to rename samples and define the experimental parameters for the data. This task can be especially complex for MaxQuant because a unified output is generated even if distinctly separate experiments are searched as a batch and with different parameters applied. The function jwrangle.MQ_writeMetadata( ) will take a metadata table, rename all samples in the proteinGroups and evidence files, assign alternative filenames, and save new copies to be used in future analyses.
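The renaming step can be pictured with plain pandas. This is a sketch under stated assumptions, not the implementation of jwrangle.MQ_writeMetadata( ): the helper rename_by_metadata is hypothetical, the tables are toy data, and experiment names are assumed to contain no spaces so that the sample identifier is always the last space-separated token of a column name.

```python
import pandas as pd

# Toy metadata: maps MaxQuant experiment names to reader-friendly sample names
meta = pd.DataFrame({
    'experiment': ['01_Slot1-1_1_3210', '02_Slot1-1_1_3211'],
    'sample': ['01_TI_1ul_nCL_A', '02_TI_1ul_nCL_B'],
})

# Toy proteinGroups-style table with per-experiment columns
pg = pd.DataFrame({
    'Intensity 01_Slot1-1_1_3210': [1.0, 2.0],
    'Intensity 02_Slot1-1_1_3211': [3.0, 4.0],
    'Peptides 01_Slot1-1_1_3210': [5, 6],
})


def rename_by_metadata(df, meta):
    """Swap the experiment suffix of each column for its sample name."""
    lookup = dict(zip(meta['experiment'], meta['sample']))
    new_names = {}
    for col in df.columns:
        prefix, _, suffix = col.rpartition(' ')
        if suffix in lookup:
            new_names[col] = f"{prefix} {lookup[suffix]}"
    return df.rename(columns=new_names)


renamed = rename_by_metadata(pg, meta)
print(list(renamed.columns))
```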

In [3]:
#### Inspect column names
colnames = list(pGroups.columns.values)
In [4]:
#### Derive experiment names as a list
experiment_names = []
for i in colnames:
    if 'Intensity ' in i:
        experiment_names.append(i.replace('Intensity ', ''))
In [4]:
#### Create a list of associated conditions
condition =  ['TI_1ul_nCL']*6 + ['TI_5ul_254']*6 + ['OdT_nCL']*6 + ['OdT_254']*6 + ['Sil_nCL']*6 + ['Sil_254']*6
In [5]:
#### Create a list of associated replicate identifiers
replicate = ['A','B','C','D','E','F']*6
In [8]:
#### Create a more reader friendly list of sample names
samples = ['01_TI_1ul_nCL_A', '02_TI_1ul_nCL_B', '03_TI_1ul_nCL_C', '04_TI_1ul_nCL_D', '05_TI_1ul_nCL_E', '06_TI_1ul_nCL_F', 
           '07_TI_5ul_254_A', '08_TI_5ul_254_B', '09_TI_5ul_254_C', '10_TI_5ul_254_D', '11_TI_5ul_254_E', '12_TI_5ul_254_F', 
           '13_OdT_nCL_A', '14_OdT_nCL_B', '15_OdT_nCL_C', '16_OdT_nCL_D', '17_OdT_nCL_E', '18_OdT_nCL_F', 
           '19_OdT_254_A', '20_OdT_254_B', '21_OdT_254_C', '22_OdT_254_D', '23_OdT_254_E', '24_OdT_254_F', 
           '25_Sil_nCL_A', '26_Sil_nCL_B', '27_Sil_nCL_C', '28_Sil_nCL_D', '29_Sil_nCL_E', '30_Sil_nCL_F', 
           '31_Sil_254_A', '32_Sil_254_B', '33_Sil_254_C', '34_Sil_254_D', '35_Sil_254_E', '36_Sil_254_F']
In [9]:
#### Define the experiment group each sample belongs to.
MQ_groups = ['TI_lo']*6 + ['TI_hi']*6 + ['OdT']*12 + ['Sil']*12
In [10]:
#### Create metadata dataframe and inspect
expt_df = pd.DataFrame(
    {'experiment': experiment_names,
     'condition': condition,
     'replicate': replicate,
     'sample':samples,
     'measure':['Intensity']*len(samples),                  # adding this column allows our metadata file to be compatible with Proteus
     'MQgroups':MQ_groups
    })

expt_df
Out[10]:
experiment condition replicate sample measure MQgroups
0 01_Slot1-1_1_3210 TI_1ul_nCL A 01_TI_1ul_nCL_A Intensity TI_lo
1 02_Slot1-1_1_3211 TI_1ul_nCL B 02_TI_1ul_nCL_B Intensity TI_lo
2 03_Slot1-1_1_3212 TI_1ul_nCL C 03_TI_1ul_nCL_C Intensity TI_lo
3 04_Slot1-1_1_3213 TI_1ul_nCL D 04_TI_1ul_nCL_D Intensity TI_lo
4 05_Slot1-1_1_3214 TI_1ul_nCL E 05_TI_1ul_nCL_E Intensity TI_lo
5 06_Slot1-1_1_3215 TI_1ul_nCL F 06_TI_1ul_nCL_F Intensity TI_lo
6 07_Slot1-1_1_3217 TI_5ul_254 A 07_TI_5ul_254_A Intensity TI_hi
7 08_Slot1-1_1_3218 TI_5ul_254 B 08_TI_5ul_254_B Intensity TI_hi
8 09_Slot1-1_1_3219 TI_5ul_254 C 09_TI_5ul_254_C Intensity TI_hi
9 10_Slot1-1_1_3220 TI_5ul_254 D 10_TI_5ul_254_D Intensity TI_hi
10 11_Slot1-1_1_3221 TI_5ul_254 E 11_TI_5ul_254_E Intensity TI_hi
11 12_Slot1-1_1_3222 TI_5ul_254 F 12_TI_5ul_254_F Intensity TI_hi
12 13_Slot1-1_1_3196 OdT_nCL A 13_OdT_nCL_A Intensity OdT
13 14_Slot1-1_1_3197 OdT_nCL B 14_OdT_nCL_B Intensity OdT
14 15_Slot1-1_1_3198 OdT_nCL C 15_OdT_nCL_C Intensity OdT
15 16_Slot1-1_1_3199 OdT_nCL D 16_OdT_nCL_D Intensity OdT
16 17_Slot1-1_1_3200 OdT_nCL E 17_OdT_nCL_E Intensity OdT
17 18_Slot1-1_1_3201 OdT_nCL F 18_OdT_nCL_F Intensity OdT
18 19_Slot1-1_1_3203 OdT_254 A 19_OdT_254_A Intensity OdT
19 20_Slot1-1_1_3204 OdT_254 B 20_OdT_254_B Intensity OdT
20 21_Slot1-1_1_3205 OdT_254 C 21_OdT_254_C Intensity OdT
21 22_Slot1-1_1_3206 OdT_254 D 22_OdT_254_D Intensity OdT
22 23_Slot1-1_1_3207 OdT_254 E 23_OdT_254_E Intensity OdT
23 24_Slot1-1_1_3208 OdT_254 F 24_OdT_254_F Intensity OdT
24 25_Slot1-1_1_3159 Sil_nCL A 25_Sil_nCL_A Intensity Sil
25 26_Slot1-1_1_3158 Sil_nCL B 26_Sil_nCL_B Intensity Sil
26 27_Slot1-1_1_3160 Sil_nCL C 27_Sil_nCL_C Intensity Sil
27 28_Slot1-1_1_3161 Sil_nCL D 28_Sil_nCL_D Intensity Sil
28 29_Slot1-1_1_3162 Sil_nCL E 29_Sil_nCL_E Intensity Sil
29 30_Slot1-1_1_3163 Sil_nCL F 30_Sil_nCL_F Intensity Sil
30 31_Slot1-1_1_3170 Sil_254 A 31_Sil_254_A Intensity Sil
31 32_Slot1-1_1_3165 Sil_254 B 32_Sil_254_B Intensity Sil
32 33_Slot1-1_1_3166 Sil_254 C 33_Sil_254_C Intensity Sil
33 34_Slot1-1_1_3167 Sil_254 D 34_Sil_254_D Intensity Sil
34 35_Slot1-1_1_3168 Sil_254 E 35_Sil_254_E Intensity Sil
35 36_Slot1-1_1_3169 Sil_254 F 36_Sil_254_F Intensity Sil
In [11]:
# MQ_expt298 = jwrangle.MQ_writeMetadata(pGroups, evidence, expt_df, 'e311', cwd)
metadata.csv created in metadata folder
e311_proteinGroups_metalabeled.txt created in MaxQuant folder
e311_evidence_metalabeled.txt created in MaxQuant folder

3. Re-Annotate the MaxQuant pGroups with stable Gene IDs

Functions
jweb.mapAnyID( )
jwrangle.importMixedFiles( )

MaxQuant does a good job of assigning a Gene name to each protein group. Presumably these gene names come from the FASTA. However:

  • Sometimes it fails to find a gene name
  • Sometimes it will assign an ID that is not a gene or include out-of-place characters
  • It doesn't always seem to be consistent
  • If the gene name originates from the FASTA then repeating the MQ search with an updated FASTA is the only way to update the gene IDs.
  • Use of a mapping service will standardise the ID conversion practices between my datasets and those of others, including RNA-Seq.

To avoid these problems we will remap the Majority protein IDs to ENTREZ gene IDs. jweb.mapAnyID( ) will retrieve all possible genes for each protein group, and will also select a primary ID to singularly represent the group by a consistent method. This is a very flexible function, see help( ) for further explanation. From this point, the MQ 'Gene names' column will no longer be necessary. This function can also handle ID mapping to and from almost any convention.

Ensuring our proteins have a consistent gene naming strategy is essential for inter-experiment comparison and the later use of set methods. It also creates a standard that can be applied for accurately mapping RNA-Seq results and thus aid in future mapping of protein-RNA partners.
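Before any mapping service can be queried, each 'Majority protein IDs' entry has to be broken into individual accessions. The sketch below shows one plausible reading of the splitstr = [';', '-'] argument used later in this section. It is an assumption on my part that ';' separates group members and '-' introduces an isoform suffix; split_protein_ids is a hypothetical helper and not the jweb code.

```python
def split_protein_ids(majority_ids, splitstr=(';', '-')):
    """Split a 'Majority protein IDs' string into unique base accessions.

    Assumption: ';' separates group members and '-' introduces an isoform
    suffix (e.g. P35237-2), which is dropped before mapping.
    """
    group_sep, isoform_sep = splitstr
    seen = []
    for member in majority_ids.split(group_sep):
        base = member.split(isoform_sep)[0].strip()
        if base and base not in seen:
            seen.append(base)
    return seen


ids = split_protein_ids('A0A087X1N8;A0A024QZX5;P35237-2;P35237')
print(ids)   # ['A0A087X1N8', 'A0A024QZX5', 'P35237']
```

Keeping the original member order matters if the first accession is later used to pick a primary ID for the group.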

In [2]:
#### If not already loaded, read in the metadata-adjusted files
metadata = pd.read_csv(cwd / 'metadata' / 'e311_metadata.csv', index_col = 0)
pGroups = pd.read_csv(cwd / 'MaxQuant' / 'e311_proteinGroups_metalabeled.txt', delimiter = '\t')
evidence = pd.read_csv(cwd / 'MaxQuant' / 'e311_evidence_metalabeled.txt', delimiter = '\t')
C:\Users\smith.j\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3051: DtypeWarning: Columns (355) have mixed types.Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
C:\Users\smith.j\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3051: DtypeWarning: Columns (63,71) have mixed types.Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
In [3]:
#### Dynamically remap gene names in our proteinGroups file and save a copy
# pGroups_remap = jweb.mapAnyID_gPro(pGroups['Majority protein IDs'].tolist(), splitstr = [';', '-'], geneProductType = 'protein', 
#                               gConvertOrganism = 'hsapiens', gConvertTarget = 'ENTREZGENE', writetopath = [cwd, 'pGroups_remap'], writeTargetsAsList = 'NO')
In [4]:
#### If not already loaded, read in the remapped proteinGroups file
pGroups_remap = jwrangle.importMixedFiles(cwd / 'downloads' / 'pGroups_remap', dropSuffix = 'yes')
pGroups_remap.keys()
Out[4]:
dict_keys(['id_map', 'query_map'])
In [5]:
#### jwrangle.importMixedFiles() returns a dictionary where keys = files. We want the 'id_map' table created by jweb.mapAnyID_gPro().
#### We'll rename the Query column and drop duplicates so the table can be merged with our proteinGroups table.
id_map = pGroups_remap['id_map'].rename(columns={'Query': 'Majority protein IDs'}).drop_duplicates()
id_map.head(2)
Out[5]:
Majority protein IDs ENTREZGENE_gPro all ENTREZGENE_gPro primary ENTREZGENE_gPro name UNIPROT_gPro status
0 K7EK40;K7EMD8;K7ERM9;A0A024QZ33;Q9H0G5 NSRP1 NSRP1 nuclear speckle splicing regulatory protein 1 ... SWISSPROT
1 A0A087X1N8;A0A024QZX5;P35237 SERPINB6 SERPINB6 serpin family B member 6 [Source:HGNC Symbol;A... SWISSPROT
In [6]:
#### Now use merge to add these new columns to our proteinGroups table
pGroups_map = pd.merge(pGroups, id_map, on='Majority protein IDs', how='left')

#### Check the tables are merged by viewing column elements from each.
pGroups_map[id_map.columns.tolist() + ['Peptide IDs']].head(2)
Out[6]:
Majority protein IDs ENTREZGENE_gPro all ENTREZGENE_gPro primary ENTREZGENE_gPro name UNIPROT_gPro status Peptide IDs
0 K7EK40;K7EMD8;K7ERM9;A0A024QZ33;Q9H0G5 NSRP1 NSRP1 nuclear speckle splicing regulatory protein 1 ... SWISSPROT 2252
1 A0A087X1N8;A0A024QZX5;P35237 SERPINB6 SERPINB6 serpin family B member 6 [Source:HGNC Symbol;A... SWISSPROT 792;11854;16461;18626;21238;26572;26894;35209;...

4. Review Contaminants by Sample

Functions
jinspect.MQ_getContaminants( )
jvis.MQ_getContaminants_sbplot( )
jwrangle.importMixedFiles( )

We can extract the contaminants from our proteinGroups file using jinspect.MQ_getContaminants( ). The extracted table contains log2(iBAQ) values.
Contaminants can then be reviewed with jvis.MQ_getContaminants_sbplot( ).
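For readers without the suite installed, the extraction can be approximated in pandas. This is a sketch only: it assumes the proteinGroups table flags contaminants with '+' in a 'Potential contaminant' column, uses toy values, and reports undetected proteins as 0.0 as in the output shown below. It is not the jinspect implementation.

```python
import numpy as np
import pandas as pd

# Toy proteinGroups-style table (values are illustrative only)
pg = pd.DataFrame({
    'Majority protein IDs': ['P35237', 'CON__P00761', 'Q9H0G5'],
    'Potential contaminant': [np.nan, '+', np.nan],
    'iBAQ 01_TI_1ul_nCL_A': [669.74, 1024.0, 0.0],
    'iBAQ 02_TI_1ul_nCL_B': [775.78, 0.0, 0.0],
})

# Keep only rows flagged as contaminants
cont = pg[pg['Potential contaminant'] == '+'].set_index('Majority protein IDs')

# log2-transform the iBAQ columns; zeros (not detected) are reported as 0.0
ibaq_cols = [c for c in cont.columns if c.startswith('iBAQ ')]
values = cont[ibaq_cols].astype(float)
log2_ibaq = np.log2(values.where(values > 0, 1.0))
log2_ibaq.columns = [c.replace('iBAQ ', '') for c in ibaq_cols]
print(log2_ibaq)
```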

In [7]:
#### Extract contaminants
contaminants = jinspect.MQ_getContaminants(pGroups_map, metadata)
contaminants.head(2)
Out[7]:
01_TI_1ul_nCL_A 02_TI_1ul_nCL_B 03_TI_1ul_nCL_C 04_TI_1ul_nCL_D 05_TI_1ul_nCL_E 06_TI_1ul_nCL_F 07_TI_5ul_254_A 08_TI_5ul_254_B 09_TI_5ul_254_C 10_TI_5ul_254_D ... 27_Sil_nCL_C 28_Sil_nCL_D 29_Sil_nCL_E 30_Sil_nCL_F 31_Sil_254_A 32_Sil_254_B 33_Sil_254_C 34_Sil_254_D 35_Sil_254_E 36_Sil_254_F
Protein ID: Gene
A0A0A0MT01: GSN 0.0 0.0 0.0 0.0 7.666686 0.0 12.531235 10.266318 10.818822 9.303073 ... 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSBTAP00000007350: nan 0.0 0.0 0.0 0.0 0.000000 0.0 11.759805 0.000000 7.265756 4.590482 ... 0.0 0.0 0.0 0.0 7.407608 6.615151 6.109632 5.486071 8.240219 7.511436

2 rows × 36 columns

In [8]:
#### Visually inspect contaminants
jvis.MQ_getContaminants_sbplot(contaminants, metadata, width = 1, length = 2, layout = 'single')
<Figure size 432x288 with 0 Axes>

Results: Dirtier than usual

I expect this is a result of using RNAses T1 and A from a non-HPLC purified source.
The presence of streptavidin in the total input samples is puzzling.
RNAses were introduced to investigate the improvement of digestion efficiency. They are unlikely to be used routinely because they introduce background signal to the nCL groups.

5. Assess Digestion Efficiency

Functions
jinspect.MQ_getMissedCleavages( )
jvis.CommonPalettesAsHex
jvis.BarPlotByGroup_sbplot( )

Assessing missed cleavages is an essential metric for understanding the quality of the tryptic digestion. This data is recorded in the evidence file.
jinspect.MQ_getMissedCleavages( ) will return a long form data table that can easily be used for plotting.
The dictionary jvis.CommonPalettesAsHex contains a number of palettes that are common to both matplotlib and ggplot (from R). These are provided to ensure consistency is easy to achieve across both languages.
We'll plot the missed cleavages with the generic function jvis.BarPlotByGroup_sbplot( )
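As a rough picture of what the missed cleavage metric represents, the sketch below computes the percentage of evidence entries carrying at least one missed cleavage per sample. It assumes the evidence table has 'Experiment' and 'Missed cleavages' columns and uses toy data. Whether jinspect.MQ_getMissedCleavages( ) uses exactly this definition is an assumption.

```python
import pandas as pd

# Toy evidence-style table: one row per identified peptide feature
ev = pd.DataFrame({
    'Experiment': ['S1', 'S1', 'S1', 'S1', 'S2', 'S2'],
    'Missed cleavages': [0, 0, 1, 0, 2, 0],
})

# Percentage of evidence entries with at least one missed cleavage, per sample
pct_missed = (
    ev.assign(missed=ev['Missed cleavages'] > 0)
      .groupby('Experiment')['missed']
      .mean()
      .mul(100)
      .round()
      .astype(int)
      .rename('% Missed Cleavages')
      .reset_index()
)
print(pct_missed)
```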

In [9]:
#### Extract the missed cleavage data into a long form table for plotting
MissedCleavages = jinspect.MQ_getMissedCleavages(evidence, metadata, drop_contaminants = True)
MissedCleavages.sort_values(by=['sample'], inplace = True)
MissedCleavages
Out[9]:
sample % Missed Cleavages group expt
2 01_TI_1ul_nCL_A 9 TI_1ul_nCL TI_lo
17 02_TI_1ul_nCL_B 9 TI_1ul_nCL TI_lo
24 03_TI_1ul_nCL_C 9 TI_1ul_nCL TI_lo
35 04_TI_1ul_nCL_D 8 TI_1ul_nCL TI_lo
5 05_TI_1ul_nCL_E 9 TI_1ul_nCL TI_lo
28 06_TI_1ul_nCL_F 8 TI_1ul_nCL TI_lo
29 07_TI_5ul_254_A 12 TI_5ul_254 TI_hi
21 08_TI_5ul_254_B 12 TI_5ul_254 TI_hi
22 09_TI_5ul_254_C 12 TI_5ul_254 TI_hi
1 10_TI_5ul_254_D 12 TI_5ul_254 TI_hi
20 11_TI_5ul_254_E 12 TI_5ul_254 TI_hi
34 12_TI_5ul_254_F 12 TI_5ul_254 TI_hi
7 13_OdT_nCL_A 5 OdT_nCL OdT
18 14_OdT_nCL_B 4 OdT_nCL OdT
26 15_OdT_nCL_C 4 OdT_nCL OdT
6 16_OdT_nCL_D 4 OdT_nCL OdT
0 17_OdT_nCL_E 4 OdT_nCL OdT
4 18_OdT_nCL_F 4 OdT_nCL OdT
15 19_OdT_254_A 13 OdT_254 OdT
23 20_OdT_254_B 12 OdT_254 OdT
3 21_OdT_254_C 12 OdT_254 OdT
31 22_OdT_254_D 12 OdT_254 OdT
9 23_OdT_254_E 12 OdT_254 OdT
19 24_OdT_254_F 13 OdT_254 OdT
10 25_Sil_nCL_A 5 Sil_nCL Sil
33 26_Sil_nCL_B 3 Sil_nCL Sil
32 27_Sil_nCL_C 8 Sil_nCL Sil
12 28_Sil_nCL_D 5 Sil_nCL Sil
25 29_Sil_nCL_E 6 Sil_nCL Sil
11 30_Sil_nCL_F 24 Sil_nCL Sil
27 31_Sil_254_A 15 Sil_254 Sil
8 32_Sil_254_B 15 Sil_254 Sil
14 33_Sil_254_C 15 Sil_254 Sil
30 34_Sil_254_D 15 Sil_254 Sil
16 35_Sil_254_E 14 Sil_254 Sil
13 36_Sil_254_F 14 Sil_254 Sil
In [10]:
### Select a colour palette  
cpal = jvis.CommonPalettesAsHex

set2_paired = []
for i in cpal['Set2_qual']:
    set2_paired.append(i)
    set2_paired.append(i)

#### Plot the grouped data points  
sns.set_style('whitegrid')
jvis.BarPlotByGroup_sbplot(MissedCleavages, x_col = 'group', y_col = '% Missed Cleavages', title = '% Missed Cleavages', pal = set2_paired)
<Figure size 432x288 with 0 Axes>

Results: Good

Good digestion efficiency might be the individual or combined result of

  • prior RNAse treatment (usually not done)
  • Trypsin increased to 0.5ug (usually 0.2ug)
  • Smaller digestion volumes (50ul)
  • Less protein (only 100k cells for TI, and RBP from 2e6 cells for Sil/OdT)

Moving forward, I think we can assume there is a great deal of protein
Next time, increase to 1ug Trypsin per 10e6 cells in 50ul without RNAse pretreatment; maintain 0.5ug for RBP extracted from 5e6 or fewer cells.

6. Remove Contaminants

Functions
jwrangle.MQ_getThreePassFilter( )
SeqIO.parse( )

After QC we no longer want the contaminants in our data. jwrangle.MQ_getThreePassFilter( ) will remove reverse peptides, contaminants, and 'only identified by site' entries from MQ tables.
The filter will also accept customised exclusion lists in case users have added odd protein species to the search FASTA. In this particular experiment we added RNAse proteins and the large T antigen to the human FASTA. The former serve 1) as a check that dynamic range is not being overwhelmed and 2) as a quantitative spike-in control to compare tryptic efficiency and sample recovery across samples following C18 cleanup.
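The logic of such a filter can be sketched in a few lines of pandas. This assumes the three MaxQuant flag columns are named 'Reverse', 'Potential contaminant' and 'Only identified by site' and are marked with '+'. The function three_pass_filter and the accession 'X00001' are hypothetical stand-ins, not the jwrangle code or a real spike-in ID.

```python
import numpy as np
import pandas as pd

# Toy proteinGroups-style table with one row per exclusion reason
pg = pd.DataFrame({
    'Majority protein IDs': ['P35237', 'REV__Q1', 'CON__P00761', 'Q9H0G5', 'X00001;P12345'],
    'Reverse': [np.nan, '+', np.nan, np.nan, np.nan],
    'Potential contaminant': [np.nan, np.nan, '+', np.nan, np.nan],
    'Only identified by site': [np.nan, np.nan, np.nan, '+', np.nan],
})


def three_pass_filter(df, custom_exclusion=()):
    """Drop reverse hits, contaminants, site-only hits and custom accessions."""
    flags = ['Reverse', 'Potential contaminant', 'Only identified by site']
    flagged = df[flags].eq('+').any(axis=1)
    # A group is excluded if any of its member accessions is on the custom list
    excluded = set(custom_exclusion)
    members = df['Majority protein IDs'].str.split(';')
    custom = members.apply(lambda ids: any(i in excluded for i in ids))
    return df[~(flagged | custom)].reset_index(drop=True)


clean = three_pass_filter(pg, custom_exclusion=['X00001'])
print(clean['Majority protein IDs'].tolist())
```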

In [11]:
#### Map the location of the custom FASTA elements
os.listdir(base_path / 'my_resources' / 'FASTA')  

#### Create a list of the non-human proteins that were added to the custom FASTA genome search. 
new_cont = []
with open(base_path / 'my_resources' / 'FASTA' / "custom_proteome_elements.fasta", "r") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        new_cont.append(record.id.split('|')[1])

#### Remove all unwanted contaminants and IDs from the proteinGroups table      
pGroup_clean = jwrangle.MQ_getThreePassFilter(pGroups_map, custom_exclusion = new_cont)

#### Inspect the cleaned dataframe
pGroup_clean[['ENTREZGENE_gPro primary'] + [i for i in pGroup_clean.columns if 'iBAQ' in i]].head(2)
Out[11]:
ENTREZGENE_gPro primary iBAQ iBAQ 01_TI_1ul_nCL_A iBAQ 02_TI_1ul_nCL_B iBAQ 03_TI_1ul_nCL_C iBAQ 04_TI_1ul_nCL_D iBAQ 05_TI_1ul_nCL_E iBAQ 06_TI_1ul_nCL_F iBAQ 07_TI_5ul_254_A iBAQ 08_TI_5ul_254_B ... iBAQ 27_Sil_nCL_C iBAQ 28_Sil_nCL_D iBAQ 29_Sil_nCL_E iBAQ 30_Sil_nCL_F iBAQ 31_Sil_254_A iBAQ 32_Sil_254_B iBAQ 33_Sil_254_C iBAQ 34_Sil_254_D iBAQ 35_Sil_254_E iBAQ 36_Sil_254_F
0 NSRP1 26022.0 0.00 0.00 0.00 0.0 0.00 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 SERPINB6 28624.0 669.74 775.78 200.28 0.0 276.37 0.0 7852.6 1964.5 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

2 rows × 38 columns

7. Drop Gene Duplicates and Filter Intensities by LFQ

Functions
jinspect.MQ_dropDuplicateIDs( )

The next step focuses on improving confidence in the quality of our data. This is done by applying jinspect.MQ_dropDuplicateIDs( ) which has the below effects:

  • Because one gene can have many proteins, sometimes MaxQuant will create multiple proteinGroups for a single gene. As most of our analysis focuses on genes we'll trim the lowest quality proteinGroups duplicates from the table.
  • Standard LFQ defaults require a minimum of 2 peptide species, at least one of which must be unique, for quantitation to be applied. Intensity and iBAQ values, however, do not have such a minimum limit. I consider a 2 peptide minimum to be a wise filter but still have use for the Intensity and iBAQ values. Thus where the LFQ filter is applied all measurements that do not meet the minimum limit will be discarded. In short, if there isn't a companion LFQ value, there won't be an Intensity or iBAQ value either after filtering.
  • It has been documented that Match Between Runs suffers a high frequency of false peptide transfers (Lim, Paulo, Gygi 2019; doi: 10.1021/acs.jproteome.9b00492). At the protein level, however, this false transfer rate is greatly mitigated by the minimum peptide rule applied by the LFQ algorithm. This is another good reason for our filtering step.
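The LFQ filter described in the second bullet can be illustrated as a simple mask: wherever a sample has no LFQ value, its Intensity and iBAQ values are discarded too. This is a sketch with toy data; apply_lfq_filter is a hypothetical helper, and representing a discarded value as 0 is an assumption about how the suite stores it.

```python
import pandas as pd

# Toy table: two proteins measured in two samples
pg = pd.DataFrame({
    'LFQ intensity S1': [0.0, 5.0e6],
    'LFQ intensity S2': [2.0e6, 0.0],
    'Intensity S1': [1.0e5, 6.0e6],
    'Intensity S2': [3.0e6, 4.0e4],
    'iBAQ S1': [1.0e4, 7.0e5],
    'iBAQ S2': [2.0e5, 3.0e3],
})


def apply_lfq_filter(df, samples, prefixes=('Intensity', 'iBAQ')):
    """Zero any measure whose companion LFQ value is missing (zero)."""
    out = df.copy()
    for s in samples:
        no_lfq = out[f'LFQ intensity {s}'] == 0
        for p in prefixes:
            out.loc[no_lfq, f'{p} {s}'] = 0.0
    return out


filtered = apply_lfq_filter(pg, samples=['S1', 'S2'])
print(filtered[['Intensity S1', 'Intensity S2', 'iBAQ S1', 'iBAQ S2']])
```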
In [12]:
#### Drop duplicates and apply LFQ filter
filter_dict = jinspect.MQ_dropDuplicateIDs(pGroup_clean, metadata, prefix = 'Peptides', ID = 'ENTREZGENE_gPro primary', pool = 'measure', drop_ID = 'None', 
                                            keep_PoolCalcs = False, applyLFQ_filter = ['Intensity', 'iBAQ'])
#### Inspect filter dictionary
filter_dict.keys()
WARNING: jinspect.MQ_getMeasuredMeansByGroup() has not been checked
Out[12]:
dict_keys(['df_keep', 'df_droprows'])
In [13]:
#### The df_keep value contains our targets, df_droprows contains the discarded duplicates. Assign the df_keep value to a new variable and inspect.
pGroup_filtered = filter_dict['df_keep']
pGroup_filtered.head(2)
Out[13]:
Protein IDs Majority protein IDs Peptide counts (all) Peptide counts (razor+unique) Peptide counts (unique) Protein names Gene names Fasta headers Number of proteins Peptides ... iBAQ 27_Sil_nCL_C iBAQ 28_Sil_nCL_D iBAQ 29_Sil_nCL_E iBAQ 30_Sil_nCL_F iBAQ 31_Sil_254_A iBAQ 32_Sil_254_B iBAQ 33_Sil_254_C iBAQ 34_Sil_254_D iBAQ 35_Sil_254_E iBAQ 36_Sil_254_F
0 K7EK40;K7EMD8;K7ERM9;A0A024QZ33;Q9H0G5 K7EK40;K7EMD8;K7ERM9;A0A024QZ33;Q9H0G5 1;1;1;1;1 1;1;1;1;1 1;1;1;1;1 Nuclear speckle splicing regulatory protein 1 NSRP1;CCDC55 ;;;; 5 1 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 A0A087X1N8;A0A024QZX5;P35237 A0A087X1N8;A0A024QZX5;P35237 10;10;10 10;10;10 10;10;10 Serpin B6 SERPINB6 ;; 3 10 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

2 rows × 370 columns

8. Review Sample Clustering by Group

Functions
jtest.getDistanceMatrix( )
jvis.MQ_showDendrogramQC_mplplot( )

A distance matrix function jtest.getDistanceMatrix( ) is provided for users who wish to apply different algorithms or create different visualisations.
I like the 'ward' method for distance calculations and use a dendrogram to confirm that clustering matches expectations, so I use the prerolled function jvis.MQ_showDendrogramQC_mplplot( ).
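For users who want to roll their own check, Ward clustering of samples can be done directly with SciPy. The sketch below uses a toy matrix of log2 intensities (rows are samples) and is not the jvis implementation. The sample names and values are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy log2 intensities: rows = samples, columns = proteins
samples = ['nCL_A', 'nCL_B', 'cCL_A', 'cCL_B']
X = np.array([
    [20.0, 21.0, 19.5],
    [20.2, 20.8, 19.7],
    [26.0, 27.1, 25.4],
    [25.8, 27.0, 25.6],
])

# Ward linkage on Euclidean distances between samples
Z = linkage(X, method='ward')

# Cut the tree into two clusters and check that replicates stay together
labels = fcluster(Z, t=2, criterion='maxclust')
print(dict(zip(samples, labels)))

# Leaf order without drawing; drop no_plot=True to render with matplotlib
tree = dendrogram(Z, labels=samples, no_plot=True)
print(tree['ivl'])
```

If replicates of one condition end up in different clusters, that is usually the first hint of a sample swap or a failed prep.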

In [14]:
#### Confirm that clustering matches expectations
jvis.MQ_showDendrogramQC_mplplot(pGroup_filtered, 'LFQ intensity', metadata, 'QC clustering: ', grid = 'YES', fsize = (8, 8))

Result: GOOD

Clustering reveals expected results

9. Analyse Normalisation Effects by Sample

Functions
jwrangle.Log2_ByPrefix( )
jwrangle.MQ_poolMulti( )
jvis.ViolinCompare_sbplot( )

Here we review normalisation effects on each sample within the condition groups; these are most easily interpreted after log2 transformation. We will transform all measures of interest with jwrangle.Log2_ByPrefix( ) and then pool all the values of interest, by condition, with jwrangle.MQ_poolMulti( ). The function jvis.ViolinCompare_sbplot( ) will let us compare Intensity distribution on a per sample basis.

Normalisation is applied to LFQ values by MaxQuant and is a feature of its handling of label-free data. I've not seen a detailed explanation of how it works though so it is a leap of faith that Cox and Mann have selected an appropriate method.
Normalisation must be applied separately to nCL and cCL groups. This is unusual though necessary to avoid outrageous results. See expt313 for evidence.
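The transform-then-pool pattern can be sketched with pandas and NumPy. This is an illustration under stated assumptions rather than the jwrangle code: log2_by_prefix is a hypothetical helper, the data are toy values, and zeros are converted to NaN inside the helper (the notebook does this as a separate replace step).

```python
import numpy as np
import pandas as pd

# Toy table with raw and LFQ-normalised intensities for two samples
pg = pd.DataFrame({
    'Gene': ['NSRP1', 'SERPINB6'],
    'Intensity S1': [0.0, 1024.0],
    'Intensity S2': [2048.0, 4096.0],
    'LFQ intensity S1': [0.0, 512.0],
    'LFQ intensity S2': [1024.0, 2048.0],
})


def log2_by_prefix(df, prefix):
    """log2-transform every '<prefix> <sample>' column; zeros become NaN."""
    out = df.copy()
    cols = [c for c in out.columns if c.startswith(prefix + ' ')]
    out[cols] = np.log2(out[cols].replace(0, np.nan))
    return out


log2_df = log2_by_prefix(log2_by_prefix(pg, 'Intensity'), 'LFQ intensity')

# Long form: one row per gene, measure and sample, ready for violin plots
long_df = log2_df.melt(id_vars='Gene', var_name='column', value_name='log2')
parts = long_df['column'].str.rsplit(' ', n=1, expand=True)
long_df['measure'] = parts[0]
long_df['sample'] = parts[1]
print(long_df.dropna())
```

Plotting the 'Intensity' and 'LFQ intensity' distributions side by side for each sample is then a matter of grouping this long table by 'sample' and 'measure'.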

In [15]:
#### Log2 transform available intensity values.
pGroup_log2 = jwrangle.Log2_ByPrefix(pGroup_filtered, 'LFQ intensity')
pGroup_log2 = jwrangle.Log2_ByPrefix(pGroup_log2, 'iBAQ')
pGroup_log2 = jwrangle.Log2_ByPrefix(pGroup_log2, 'Intensity')
pGroup_log2.replace(0,np.nan, inplace=True)
In [16]:
#### Create a long form dataset for each desired grouping
pool_SampleIntensity = jwrangle.MQ_poolMulti(pGroup_log2, metadata, melt_list = ['Intensity', 'LFQ intensity'], group = 'condition')
pool_SampleIntensity.keys()
Out[16]:
dict_keys(['TI_1ul_nCL', 'TI_5ul_254', 'OdT_nCL', 'OdT_254', 'Sil_nCL', 'Sil_254'])
In [17]:
#### Inspect the Intensity shifts generated by the LFQ normalisation algorithm. In MaxQuant Raw Intensity and iBAQ values are 
#### not subjected to the LFQ normalisation calculations so this is a good way to spot any gross violations
sns.set_style('whitegrid')
jvis.ViolinCompare_sbplot(pool_SampleIntensity['TI_1ul_nCL'], title = 'TI_1ul_nCL: Normalisation Effects', ylabel = 'Log2(Intensity)', palette = ['#ff6666', '#99ccff'])
In [18]:
jvis.ViolinCompare_sbplot(pool_SampleIntensity['TI_5ul_254'], title = 'TI_5ul_254: Normalisation Effects', ylabel = 'Log2(Intensity)', palette = ['#ff6666', '#99ccff'])
In [19]:
jvis.ViolinCompare_sbplot(pool_SampleIntensity['OdT_nCL'], title = 'OdT_nCL: Normalisation Effects', ylabel = 'Log2(Intensity)', palette = ['#ff6666', '#99ccff'])
In [20]:
jvis.ViolinCompare_sbplot(pool_SampleIntensity['OdT_254'], title = 'OdT_254: Normalisation Effects', ylabel = 'Log2(Intensity)', palette = cpal['Set3_qual'])
In [21]:
jvis.ViolinCompare_sbplot(pool_SampleIntensity['Sil_nCL'], title = 'Sil_nCL: Normalisation Effects', ylabel = 'Log2(Intensity)', palette = cpal['Set2_qual'])
In [22]:
jvis.ViolinCompare_sbplot(pool_SampleIntensity['Sil_254'], title = 'Sil_254: Normalisation Effects', ylabel = 'Log2(Intensity)', palette = cpal['Set2_qual'])

Result: GOOD

No weird sample variations (except perhaps Sil_nCL B/F)
No outrageous normalisation adjustments

10. Compare Intensity Distribution and Sequence Coverage

Functions
jwrangle.MQ_poolDataByCondition( )
jvis.BoxPlotByColumn_sbplot( )

Next we will compare intensity and sequence coverage between groups. Log2 transformation has already been performed so we need only use jwrangle.MQ_poolDataByCondition( ) to create the appropriate long form dataset for plotting with jvis.BoxPlotByColumn_sbplot( ).

In [23]:
#### Pool data into a single long form dataset
pooled_dfDropGroupOne = jwrangle.MQ_poolDataByCondition(pGroup_log2, metadata, prefix_list = ['Intensity', 'Sequence coverage'])
In [24]:
#### Compare Intensity distribution using a box and whisker plot
sns.set_style('whitegrid')
jvis.BoxPlotByColumn_sbplot(pooled_dfDropGroupOne, 'Intensity: ', 'Intensity')
In [25]:
#### Compare Sequence coverage using a box and whisker plot
sns.set_style('whitegrid')
jvis.BoxPlotByColumn_sbplot(pooled_dfDropGroupOne, 'Sequence coverage: ', 'Sequence coverage %')

Result: GOOD

These results are consistent with expectations.
Previously I had theorised that the slightly lower sequence coverage typically found in RNP extracted samples could be due to excess RNA affecting tryptic efficiency. However, the 1ul vs 5ul Total Input samples show that the sequence coverage is most likely being reduced only as a result of less material being present, a pattern that also holds true for nCL vs cCL comparisons.

11. Compare Sum Peptide Counts

Functions
jinspect.MQ_getSumBySample( )
jvis.BarPlotByGroup_sbplot( )

To sum the total peptides observed across all proteins use jinspect.MQ_getSumBySample( ). These sums will be returned as a modified metadata table.
Plotting these by group is easily done with jvis.BarPlotByGroup_sbplot( ). The plotting order is determined by the metadata ordering.
In this case we are inspecting the number of peptides detected after having removed contaminants- thus if some spike-in proteins were removed, i.e. in this case RNAse treatments, they will not contribute to the peptide count. To look at the replicability of these spike-ins, we would reach back to the 'df_droprows' table generated by jinspect.MQ_dropDuplicateIDs( ) in section 7.
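The summing step amounts to a column-wise sum of the per-sample 'Peptides' columns, attached to a copy of the metadata. A minimal sketch with toy data follows. It assumes per-sample columns are named 'Peptides <sample>' and does not reproduce the jinspect function itself.

```python
import pandas as pd

# Toy metadata and proteinGroups-style peptide counts
meta = pd.DataFrame({
    'sample': ['S1', 'S2'],
    'condition': ['OdT_nCL', 'OdT_254'],
})

pg = pd.DataFrame({
    'Peptides': [11, 4, 7],        # total across the whole search (not per sample)
    'Peptides S1': [1, 0, 2],
    'Peptides S2': [10, 4, 5],
})

# Sum the per-sample peptide counts and attach them to a copy of the metadata
meta_stats = meta.copy()
meta_stats['Peptides'] = [float(pg[f'Peptides {s}'].sum()) for s in meta['sample']]
print(meta_stats)
```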

In [26]:
#### Extract the total peptides observed per sample
metaStats = jinspect.MQ_getSumBySample(pGroup_log2, metadata, freqList = ['Peptides'], measure = False)
metaStats
Out[26]:
experiment condition replicate sample measure MQgroups Peptides
0 01_Slot1-1_1_3210 TI_1ul_nCL A 01_TI_1ul_nCL_A Intensity TI_lo 18767.0
1 02_Slot1-1_1_3211 TI_1ul_nCL B 02_TI_1ul_nCL_B Intensity TI_lo 19005.0
2 03_Slot1-1_1_3212 TI_1ul_nCL C 03_TI_1ul_nCL_C Intensity TI_lo 14938.0
3 04_Slot1-1_1_3213 TI_1ul_nCL D 04_TI_1ul_nCL_D Intensity TI_lo 11800.0
4 05_Slot1-1_1_3214 TI_1ul_nCL E 05_TI_1ul_nCL_E Intensity TI_lo 18314.0
5 06_Slot1-1_1_3215 TI_1ul_nCL F 06_TI_1ul_nCL_F Intensity TI_lo 9414.0
6 07_Slot1-1_1_3217 TI_5ul_254 A 07_TI_5ul_254_A Intensity TI_hi 40075.0
7 08_Slot1-1_1_3218 TI_5ul_254 B 08_TI_5ul_254_B Intensity TI_hi 33472.0
8 09_Slot1-1_1_3219 TI_5ul_254 C 09_TI_5ul_254_C Intensity TI_hi 34678.0
9 10_Slot1-1_1_3220 TI_5ul_254 D 10_TI_5ul_254_D Intensity TI_hi 33739.0
10 11_Slot1-1_1_3221 TI_5ul_254 E 11_TI_5ul_254_E Intensity TI_hi 37424.0
11 12_Slot1-1_1_3222 TI_5ul_254 F 12_TI_5ul_254_F Intensity TI_hi 34369.0
12 13_Slot1-1_1_3196 OdT_nCL A 13_OdT_nCL_A Intensity OdT 3294.0
13 14_Slot1-1_1_3197 OdT_nCL B 14_OdT_nCL_B Intensity OdT 2322.0
14 15_Slot1-1_1_3198 OdT_nCL C 15_OdT_nCL_C Intensity OdT 3309.0
15 16_Slot1-1_1_3199 OdT_nCL D 16_OdT_nCL_D Intensity OdT 2605.0
16 17_Slot1-1_1_3200 OdT_nCL E 17_OdT_nCL_E Intensity OdT 2818.0
17 18_Slot1-1_1_3201 OdT_nCL F 18_OdT_nCL_F Intensity OdT 2051.0
18 19_Slot1-1_1_3203 OdT_254 A 19_OdT_254_A Intensity OdT 18752.0
19 20_Slot1-1_1_3204 OdT_254 B 20_OdT_254_B Intensity OdT 21800.0
20 21_Slot1-1_1_3205 OdT_254 C 21_OdT_254_C Intensity OdT 19116.0
21 22_Slot1-1_1_3206 OdT_254 D 22_OdT_254_D Intensity OdT 19794.0
22 23_Slot1-1_1_3207 OdT_254 E 23_OdT_254_E Intensity OdT 22721.0
23 24_Slot1-1_1_3208 OdT_254 F 24_OdT_254_F Intensity OdT 21526.0
24 25_Slot1-1_1_3159 Sil_nCL A 25_Sil_nCL_A Intensity Sil 2082.0
25 26_Slot1-1_1_3158 Sil_nCL B 26_Sil_nCL_B Intensity Sil 348.0
26 27_Slot1-1_1_3160 Sil_nCL C 27_Sil_nCL_C Intensity Sil 2090.0
27 28_Slot1-1_1_3161 Sil_nCL D 28_Sil_nCL_D Intensity Sil 1578.0
28 29_Slot1-1_1_3162 Sil_nCL E 29_Sil_nCL_E Intensity Sil 1284.0
29 30_Slot1-1_1_3163 Sil_nCL F 30_Sil_nCL_F Intensity Sil 166.0
30 31_Slot1-1_1_3170 Sil_254 A 31_Sil_254_A Intensity Sil 11065.0
31 32_Slot1-1_1_3165 Sil_254 B 32_Sil_254_B Intensity Sil 10487.0
32 33_Slot1-1_1_3166 Sil_254 C 33_Sil_254_C Intensity Sil 10336.0
33 34_Slot1-1_1_3167 Sil_254 D 34_Sil_254_D Intensity Sil 7930.0
34 35_Slot1-1_1_3168 Sil_254 E 35_Sil_254_E Intensity Sil 13398.0
35 36_Slot1-1_1_3169 Sil_254 F 36_Sil_254_F Intensity Sil 11246.0
In [27]:
#### Plot the sum peptides
sns.set_style('whitegrid')
jvis.BarPlotByGroup_sbplot(metaStats, x_col = 'condition', y_col = 'Peptides', title = 'Sum Peptides vs Silica Capture', pal = set2_paired,
                          errorbars = 'SEM')
<Figure size 432x288 with 0 Axes>

Results: GOOD

Obviously the 5ul total input injections return far more peptides. But do they return more proteins?

12. Compare Unique Gene Counts

Functions
jinspect.MQ_getFrequencyBySample( )

One gene can encode many proteins that often share regions of similarity. As with Illumina-based RNA-Seq, however, shotgun proteomics can rarely assign a peptide species to a single protein. MaxQuant therefore reports groups of indistinguishable proteins, called proteinGroups. Because we do not require protein-specific results, and gene identity is more stable, our gene count describes the groups to which our detected proteins have been assigned. Thus a gene here is detected by its protein product, just as it would be detected by its RNA product in RNA-Seq; none of these 3 are synonymous. To be clear, this is a count and not a measure.

Gene frequency is defined by the summed observations per protein regardless of intensity value and this data is extracted to our modified metadata with jinspect.MQ_getFrequencyBySample( ) .
A typical MQ search will yield identical protein counts (though different values) for Intensity and iBAQ*. LFQ frequencies will vary depending on the search settings:

  • In this case the MQ search has set LFQ values to be calculated on a min 2 peptide ratio (this is the default)**

Notes
* Why protein counts should be identical I don't know. The original iBAQ paper stipulates rules for the inclusion of a protein in the iBAQ calculation but MaxQuant doesn't seem to apply them.
** Previously I tested LFQ min ratio at 1 peptide. At 1 minimum peptide there was unexpected QC clustering. Possible explanations are given in section 7, and the issue is cleaned up by the jinspect.MQ_dropDuplicateIDs( ) function. We can expect this function to greatly reduce qualifying IDs (~20% fewer), especially in the QE samples, but I think the trade-off is worth it because we gain 1) a more robust ID check and 2) the ability to use the same search for LFQ-based checks of dynamic changes, i.e. comparing more than one group of cCL captures for biological changes.
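
The counting step described above can be sketched in plain pandas. This is only an illustration of the idea, not the internals of jinspect.MQ_getFrequencyBySample( ), which are not shown here. The toy table, its values, the "Intensity S1"-style column names and the helper count_detected( ) are all hypothetical, and the sketch assumes that an undetected gene is stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical proteinGroups-style table: one row per gene group,
# one column per sample and quantity. NaN is assumed to mean "not detected".
pgroups = pd.DataFrame(
    {
        "Intensity S1": [21.3, np.nan, 18.9, 25.1],
        "Intensity S2": [20.8, 19.4, np.nan, 24.7],
        "iBAQ S1": [15.2, np.nan, 12.4, 19.9],
        "iBAQ S2": [14.9, 13.1, np.nan, 19.5],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)


def count_detected(df, prefix):
    """Count non-missing rows per sample for columns named '<prefix> <sample>'."""
    cols = [c for c in df.columns if c.startswith(prefix + " ")]
    counts = df[cols].notna().sum()
    # Strip the prefix so the result is indexed by sample name only.
    counts.index = [c[len(prefix) + 1:] for c in cols]
    return counts


intensity_counts = count_detected(pgroups, "Intensity")
ibaq_counts = count_detected(pgroups, "iBAQ")
print(intensity_counts.to_dict())
print(ibaq_counts.to_dict())
```

Because the count ignores the values themselves, any two quantities with the same pattern of missing entries give identical counts, which is the behaviour noted above for Intensity and iBAQ.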

In [28]:
#### Count the number of unique genes per sample
metaStats = jinspect.MQ_getFrequencyBySample(pGroup_log2, metaStats, freqList = ['Intensity', 'iBAQ', 'LFQ intensity'], measure = False)
metaStats
Out[28]:
experiment condition replicate sample measure MQgroups Peptides Intensity iBAQ LFQ intensity
0 01_Slot1-1_1_3210 TI_1ul_nCL A 01_TI_1ul_nCL_A Intensity TI_lo 18767.0 2449 2449 2449
1 02_Slot1-1_1_3211 TI_1ul_nCL B 02_TI_1ul_nCL_B Intensity TI_lo 19005.0 2517 2517 2517
2 03_Slot1-1_1_3212 TI_1ul_nCL C 03_TI_1ul_nCL_C Intensity TI_lo 14938.0 2074 2074 2074
3 04_Slot1-1_1_3213 TI_1ul_nCL D 04_TI_1ul_nCL_D Intensity TI_lo 11800.0 1675 1675 1675
4 05_Slot1-1_1_3214 TI_1ul_nCL E 05_TI_1ul_nCL_E Intensity TI_lo 18314.0 2825 2825 2825
5 06_Slot1-1_1_3215 TI_1ul_nCL F 06_TI_1ul_nCL_F Intensity TI_lo 9414.0 1463 1463 1463
6 07_Slot1-1_1_3217 TI_5ul_254 A 07_TI_5ul_254_A Intensity TI_hi 40075.0 4387 4387 4387
7 08_Slot1-1_1_3218 TI_5ul_254 B 08_TI_5ul_254_B Intensity TI_hi 33472.0 3839 3839 3839
8 09_Slot1-1_1_3219 TI_5ul_254 C 09_TI_5ul_254_C Intensity TI_hi 34678.0 3899 3899 3899
9 10_Slot1-1_1_3220 TI_5ul_254 D 10_TI_5ul_254_D Intensity TI_hi 33739.0 3828 3828 3828
10 11_Slot1-1_1_3221 TI_5ul_254 E 11_TI_5ul_254_E Intensity TI_hi 37424.0 4219 4219 4219
11 12_Slot1-1_1_3222 TI_5ul_254 F 12_TI_5ul_254_F Intensity TI_hi 34369.0 4221 4221 4221
12 13_Slot1-1_1_3196 OdT_nCL A 13_OdT_nCL_A Intensity OdT 3294.0 429 429 429
13 14_Slot1-1_1_3197 OdT_nCL B 14_OdT_nCL_B Intensity OdT 2322.0 348 348 348
14 15_Slot1-1_1_3198 OdT_nCL C 15_OdT_nCL_C Intensity OdT 3309.0 484 484 484
15 16_Slot1-1_1_3199 OdT_nCL D 16_OdT_nCL_D Intensity OdT 2605.0 386 386 386
16 17_Slot1-1_1_3200 OdT_nCL E 17_OdT_nCL_E Intensity OdT 2818.0 463 463 463
17 18_Slot1-1_1_3201 OdT_nCL F 18_OdT_nCL_F Intensity OdT 2051.0 358 358 358
18 19_Slot1-1_1_3203 OdT_254 A 19_OdT_254_A Intensity OdT 18752.0 1852 1852 1852
19 20_Slot1-1_1_3204 OdT_254 B 20_OdT_254_B Intensity OdT 21800.0 2155 2155 2155
20 21_Slot1-1_1_3205 OdT_254 C 21_OdT_254_C Intensity OdT 19116.0 1908 1908 1908
21 22_Slot1-1_1_3206 OdT_254 D 22_OdT_254_D Intensity OdT 19794.0 2032 2032 2032
22 23_Slot1-1_1_3207 OdT_254 E 23_OdT_254_E Intensity OdT 22721.0 2431 2431 2431
23 24_Slot1-1_1_3208 OdT_254 F 24_OdT_254_F Intensity OdT 21526.0 2579 2579 2579
24 25_Slot1-1_1_3159 Sil_nCL A 25_Sil_nCL_A Intensity Sil 2082.0 315 315 315
25 26_Slot1-1_1_3158 Sil_nCL B 26_Sil_nCL_B Intensity Sil 348.0 70 70 70
26 27_Slot1-1_1_3160 Sil_nCL C 27_Sil_nCL_C Intensity Sil 2090.0 312 312 312
27 28_Slot1-1_1_3161 Sil_nCL D 28_Sil_nCL_D Intensity Sil 1578.0 277 277 277
28 29_Slot1-1_1_3162 Sil_nCL E 29_Sil_nCL_E Intensity Sil 1284.0 231 231 231
29 30_Slot1-1_1_3163 Sil_nCL F 30_Sil_nCL_F Intensity Sil 166.0 55 55 55
30 31_Slot1-1_1_3170 Sil_254 A 31_Sil_254_A Intensity Sil 11065.0 1070 1070 1070
31 32_Slot1-1_1_3165 Sil_254 B 32_Sil_254_B Intensity Sil 10487.0 992 992 992
32 33_Slot1-1_1_3166 Sil_254 C 33_Sil_254_C Intensity Sil 10336.0 983 983 983
33 34_Slot1-1_1_3167 Sil_254 D 34_Sil_254_D Intensity Sil 7930.0 813 813 813
34 35_Slot1-1_1_3168 Sil_254 E 35_Sil_254_E Intensity Sil 13398.0 1472 1472 1472
35 36_Slot1-1_1_3169 Sil_254 F 36_Sil_254_F Intensity Sil 11246.0 1357 1357 1357
In [29]:
#### Plot the counts
sns.set_style('whitegrid')
jvis.BarPlotByGroup_sbplot(metaStats, x_col = 'condition', y_col = 'Intensity', title = '# Genes Detected By Group', pal = set2_paired, ylabel = 'Unique Genes',
                          errorbars = 'SEM')
<Figure size 432x288 with 0 Axes>

Results: Interesting

This experiment tested some minor changes to the RBP purification protocol. Both OdT and Silane captures were affected by:

  • Dropping pH 10.83 NaOAc (previously used to drive the acidic phenol cocktails to pH 7.8)
  • Leaving pH to be determined by the DNAse rxn buffer

The silane capture also featured:

  • Lengthy formamide and Tris buffer washing

Both yielded comparatively high backgrounds on the nCL samples (c. 15-20%). In the next experiment (expt.314) we will:

  • Add unbuffered NaOAc
  • Leave pH to be determined by the DNAse rxn buffer
  • Shorten formamide and Tris buffer washing
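
The notebook does not state how the c. 15-20% background figure was derived. One reading, sketched here as an assumption, is the mean unique gene count of the nCL replicates as a fraction of the mean for the matching 254 replicates, using the counts from Out[28]. The helper background_fraction( ) is hypothetical.

```python
from statistics import mean

# Unique gene counts per replicate, copied from Out[28].
gene_counts = {
    "OdT_nCL": [429, 348, 484, 386, 463, 358],
    "OdT_254": [1852, 2155, 1908, 2032, 2431, 2579],
    "Sil_nCL": [315, 70, 312, 277, 231, 55],
    "Sil_254": [1070, 992, 983, 813, 1472, 1357],
}


def background_fraction(counts, capture):
    """Mean nCL gene count divided by mean 254 gene count for one capture type."""
    return mean(counts[f"{capture}_nCL"]) / mean(counts[f"{capture}_254"])


for capture in ("OdT", "Sil"):
    print(f"{capture}: {background_fraction(gene_counts, capture):.1%}")
# OdT: 19.0%
# Sil: 18.8%
```

Under this reading both captures sit at about 19%, towards the upper end of the quoted range.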

13. Assess Replicate Correlation

Functions
jwrangle.MQ_getSliceByPrefix( )
jvis.showPearsonRegression_altair( )

The function jwrangle.MQ_getSliceByPrefix( ) provides a convenient means of extracting the values of a specific group.
We can then use jvis.showPearsonRegression_altair( ) to perform pairwise comparisons between each member of those groups. The Pearson calculation is restricted to genes with intensities in both samples; genes exclusive to one sample or the other, which appear as vertical or horizontal lines of datapoints, are plotted but excluded from the calculation.
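
The shared-genes restriction can be sketched as follows. This is a minimal stand-alone version of the idea, not the code behind jvis.showPearsonRegression_altair( ). The function pearson_shared( ) and the replicate values are hypothetical, and missing intensities are assumed to be NaN.

```python
import math


def pearson_shared(x, y):
    """Pearson r over positions where both replicates have a value.

    Returns (r, n_shared). Positions that are NaN in either replicate are dropped.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared genes")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a replicate has zero variance over the shared genes")
    return sxy / math.sqrt(sxx * syy), n


nan = float("nan")
# Hypothetical log2 intensities for five genes in two replicates.
rep_a = [20.0, 22.0, nan, 24.0, 26.0]
rep_b = [10.0, 11.0, 15.0, nan, 13.0]

r, n_shared = pearson_shared(rep_a, rep_b)
print(f"r = {r:.3f} over {n_shared} shared genes")
```

Only three of the five genes contribute here; the two genes seen in a single replicate are the ones that would sit along an axis in the scatter plot.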

In [30]:
#### Extract the intensity values as a dictionary where keys = groups
Intensity_Dict = jwrangle.MQ_getSliceByPrefix(pGroup_log2, metadata, 'Intensity', group = 'condition', add_col = None)
Intensity_Dict.keys()
Out[30]:
dict_keys(['TI_1ul_nCL', 'TI_5ul_254', 'OdT_nCL', 'OdT_254', 'Sil_nCL', 'Sil_254'])
In [31]:
#### Check replicate consistency across all within group pairs
jvis.showPearsonRegression_altair(Intensity_Dict['TI_5ul_254'], mark_color = set2_paired[2])
jvis.showPearsonRegression_altair(Intensity_Dict['OdT_nCL'], mark_color = set2_paired[4])
jvis.showPearsonRegression_altair(Intensity_Dict['OdT_254'], mark_color = set2_paired[4])
jvis.showPearsonRegression_altair(Intensity_Dict['Sil_nCL'], mark_color = set2_paired[6])
jvis.showPearsonRegression_altair(Intensity_Dict['Sil_254'], mark_color = set2_paired[6])

Results: GOOD

Pearson correlations are excellent in nearly all within-group comparisons.

Conclusions

Silane
Protein returns from the Silane are low.

A. Next MS run; repeat e311 processing
Reprocess 50% of material from samples 25-36 and rerun:
If silane cCL protein numbers increase then it is a sample input issue.

B. Low input: For e314 increase input to 40%.
If silane cCL numbers increase then the results here for e311 are due either to low input or to the pH/salt/wash changes. A. above should address this.

OdT
Protein returns for OdT are as expected. This opens up the question:

  • Are the OdT cCL protein identities here the same as for Expt.304?
  • The primary difference between Expt.304 and Expt.311 was the exclusion of NaOAc. We can test this by comparing the Pearson correlations of samples between both experiments. This will give a quantitative indication of any changes. Ideally we would also rerun the MQ search with the normalisation and LFQ shared (but this probably isn't necessary).
    We can test changes to protein species by calculating the average count difference between all pairwise comparisons within each group and comparing that number between groups.
    We can then inspect all genes common within a group and do a set analysis against the other group.
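
The two proposed checks can be sketched as below, assuming "common within a group" means detected in every replicate of that group. The helper names are hypothetical. The count example reuses the OdT_254 gene counts from Out[28]; the gene sets are invented placeholders, since no Expt.304 identities are available in this notebook.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_count_difference(counts):
    """Average absolute difference in gene count over all replicate pairs."""
    return mean(abs(a - b) for a, b in combinations(counts, 2))


def core_genes(replicate_gene_sets):
    """Genes detected in every replicate of a group."""
    return set.intersection(*replicate_gene_sets)


# Unique gene counts for the OdT_254 replicates in Expt.311 (from Out[28]).
odt_254_counts = [1852, 2155, 1908, 2032, 2431, 2579]
print(round(mean_pairwise_count_difference(odt_254_counts), 1))

# Placeholder gene sets per replicate for each experiment (not real data).
e304_reps = [{"A", "B", "C"}, {"A", "B", "D"}]
e311_reps = [{"A", "B", "C"}, {"A", "C", "E"}]

core_304 = core_genes(e304_reps)
core_311 = core_genes(e311_reps)

shared = core_304 & core_311      # core genes found in both experiments
only_304 = core_304 - core_311    # core genes lost in Expt.311
only_311 = core_311 - core_304    # core genes gained in Expt.311
print(sorted(shared), sorted(only_304), sorted(only_311))
```

Comparing the pairwise count difference between experiments gives a single number per group, while the three sets show which identities actually changed.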

Total Inputs
These were originally intended to baseline both the OdT and Silane groups so that they could be compared with each other. The failure of the silane capture means further analysis will not be necessary; suffice to say that the existing TI samples in the -80 can be reused and injected at 5ul/sample.